Enhanced Word Classing for Model M

Authors

  • Stanley F. Chen
  • Stephen M. Chu

Abstract

Model M is a class-based n-gram model that has shown improvements on a variety of tasks and domains. In previous work with Model M, word classes were derived using bigram mutual information clustering. In this paper, we introduce a new word-classing method designed to closely match Model M. The proposed technique reduces speech recognition word-error rate by up to 1.1% absolute relative to the baseline clustering, for a total gain of up to 3.0% absolute over a Katz-smoothed trigram model, the largest such gain reported to date for a class-based language model.
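The baseline classing mentioned above, bigram mutual information clustering, chooses a word-to-class map that maximizes the average mutual information between the classes of adjacent words. The Python sketch below computes that objective and runs one pass of a greedy exchange search, in which each word is moved to whichever class most increases the objective. The exchange search is one common way to optimize this criterion and is assumed here for illustration only; it is not necessarily the exact clustering procedure behind the paper's baseline, and it is not the new classing method the paper proposes. The function names and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def class_bigram_mi(tokens, word2class):
    """Average mutual information between the classes of adjacent tokens."""
    pair_counts = Counter(
        (word2class[w1], word2class[w2]) for w1, w2 in zip(tokens, tokens[1:])
    )
    total = sum(pair_counts.values())
    left = Counter()   # class counts in the first position of a bigram
    right = Counter()  # class counts in the second position of a bigram
    for (c1, c2), n in pair_counts.items():
        left[c1] += n
        right[c2] += n
    mi = 0.0
    for (c1, c2), n in pair_counts.items():
        # p(c1,c2) * log(p(c1,c2) / (p_left(c1) * p_right(c2))), written with counts
        mi += (n / total) * math.log(n * total / (left[c1] * right[c2]))
    return mi


def exchange_pass(tokens, word2class, num_classes):
    """One greedy pass: move each word to the class that most increases the MI.

    Hypothetical helper; it recomputes the objective from scratch for clarity.
    """
    assignment = dict(word2class)
    for word in sorted(assignment):
        original = assignment[word]
        best_class = original
        best_mi = class_bigram_mi(tokens, assignment)
        for c in range(num_classes):
            if c == original:
                continue
            assignment[word] = c
            mi = class_bigram_mi(tokens, assignment)
            if mi > best_mi + 1e-12:
                best_class, best_mi = c, mi
        assignment[word] = best_class  # keep the best move (or no move)
    return assignment


# Toy corpus and an initial round-robin class map (illustrative only).
corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
init = {w: i % 2 for i, w in enumerate(vocab)}

before = class_bigram_mi(corpus, init)
refined = exchange_pass(corpus, init, num_classes=2)
after = class_bigram_mi(corpus, refined)
print(f"MI before: {before:.4f} nats, after one pass: {after:.4f} nats")
```

Because a move is accepted only when it raises the objective, one pass never lowers the mutual information. Recomputing the objective for every candidate move keeps the sketch short but is far too slow for real vocabularies; practical implementations update the class-bigram counts incrementally.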

Similar resources

Enhanced Word Classing for Recurrent Neural Network Language

The Recurrent Neural Network Language Model (RNNLM) has recently been shown to outperform conventional N-gram LMs as well as many other competing advanced language-modeling techniques. However, the computational complexity of RNNLM is much higher than that of the conventional N-gram LM. As a result, the class-based RNNLM (CRNNLM) is usually employed to speed up both the training and testing phases of RNNLM. In pr...

A Study of Word-Classing for MT Reordering

MT systems typically use parsers to help reorder constituents. However, most languages do not have adequate treebank data to learn good parsers, and annotating such training data is extremely time-consuming. Our earlier work has shown that a reordering model learnt from word alignments using POS tags as features can improve MT performance (Visweswariah et al., 2011). In this paper, we investiga...

Confusion Network for Arabic Name Disambiguation and Transliteration in Statistical Machine Translation

Arabic words are often ambiguous between name and non-name interpretations, frequently leading to incorrect name translations. We present a technique to disambiguate and transliterate names even if name interpretations do not exist or have relatively low probability distributions in the parallel training corpus. The key idea comprises named entity classing at the preprocessing step, decoding of...

Modeling of Nanofiltration for Concentrated Electrolyte Solutions using Linearized Transport Pore Model

In this study, linearized transport pore model (LTPM) is applied for modeling nanofiltration (NF) membrane separation process. This modeling approach is based on the modified extended Nernst-Planck equation enhanced by Debye-Hückel theory to take into account the variations of activity coefficient especially at high salt concentrations. Rejection of single-salt (NaCl) electrolyte is inve...

Multispectral Image Classification Using Back-propagation Neural Network in PCA Domain

In recent approaches to classifying multispectral remote sensing images with a back-propagation neural network (BPNN), all bands of the image must be used for training and classification. This not only increases training and classification time but also adds complexity. In this paper, to reduce this drawback, principal component analysis (PCA) is applied to ...

Publication date: 2010